Introduction
In my vectorization using .NET APIs blog, I describe the SIMD datatypes Vector64<T> and Vector128<T> that the ‘Arm64 hardware intrinsic’ APIs operate on. These APIs are present under the System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 classes. In this post I will describe those hardware intrinsic APIs by showing sample usage along with example inputs, results, and the generated Arm64 code. This should help people understand these APIs so they can use them to optimize .NET code that targets Arm64. Since there are 360 APIs, describing all of them in a single post would be overwhelming, so I have divided them among 8 blogs and will demonstrate 45 APIs in each. This is part 3 of that blog series. You can also check out the previous blogs in the series.
Most of the descriptions of these APIs are adapted from the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile. You can also refer to the descriptions of the SIMD and floating-point instructions on the Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. ConvertToUInt32RoundToPositiveInfinityScalar
Vector64<uint> ConvertToUInt32RoundToPositiveInfinityScalar(Vector64<float> value)
This method converts the first element of the value vector from a floating-point value to an unsigned integer using the Round towards Plus Infinity rounding mode, stores it in the first element of the result vector, and returns the result vector. The remaining elements of the result are set to 0.
private Vector64<uint> ConvertToUInt32RoundToPositiveInfinityScalarTest(Vector64<float> value)
{
    return AdvSimd.ConvertToUInt32RoundToPositiveInfinityScalar(value);
}
// value = <11.5, 12.5>
// Result = <12, 0>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt32RoundToPositiveInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[UInt32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtpu s16, s0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
2. ConvertToUInt32RoundToZero
Vector64<uint> ConvertToUInt32RoundToZero(Vector64<float> value)
This method converts each element in the value vector from a floating-point value to an unsigned integer using the Round towards Zero rounding mode, stores the results in the result vector, and returns the result vector.
private Vector64<uint> ConvertToUInt32RoundToZeroTest(Vector64<float> value)
{
    return AdvSimd.ConvertToUInt32RoundToZero(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<uint> ConvertToUInt32RoundToZero(Vector128<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt32RoundToZeroTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[UInt32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtzu v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
3. ConvertToUInt32RoundToZeroScalar
Vector64<uint> ConvertToUInt32RoundToZeroScalar(Vector64<float> value)
This method converts the first element of the value vector from a floating-point value to an unsigned integer using the Round towards Zero rounding mode, stores it in the first element of the result vector, and returns the result vector. The remaining elements of the result are set to 0.
private Vector64<uint> ConvertToUInt32RoundToZeroScalarTest(Vector64<float> value)
{
    return AdvSimd.ConvertToUInt32RoundToZeroScalar(value);
}
// value = <11.5, 12.5>
// Result = <11, 0>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt32RoundToZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[UInt32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtzu s16, s0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
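The difference between ConvertToUInt32RoundToZero and ConvertToUInt32RoundToZeroScalar can be illustrated with a small portable model. This is only a sketch: LaneModel, ConvertAll and ConvertScalar are hypothetical names that are not part of the .NET API, plain arrays stand in for Vector64<float>, and the inputs are assumed to be non-negative and within the range of uint.

```csharp
using System;

// Hypothetical model (not part of the .NET API): arrays stand in for Vector64<float>.
public static class LaneModel
{
    // Models the vector form: every lane is truncated toward zero.
    public static uint[] ConvertAll(float[] value)
    {
        var result = new uint[value.Length];
        for (int i = 0; i < value.Length; i++)
        {
            result[i] = (uint)MathF.Truncate(value[i]);
        }
        return result;
    }

    // Models the Scalar form: only lane 0 is converted, the other lanes stay 0.
    public static uint[] ConvertScalar(float[] value)
    {
        var result = new uint[value.Length]; // new arrays are zero-initialized
        result[0] = (uint)MathF.Truncate(value[0]);
        return result;
    }
}
```

With the input <11.5, 12.5>, ConvertAll yields <11, 12> and ConvertScalar yields <11, 0>, which matches the results listed for the two APIs above.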
4. ConvertToUInt64RoundAwayFromZero
Vector128<ulong> ConvertToUInt64RoundAwayFromZero(Vector128<double> value)
This method converts each element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round to Nearest with Ties to Away rounding mode, stores the results in the result vector, and returns the result vector.
private Vector128<ulong> ConvertToUInt64RoundAwayFromZeroTest(Vector128<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundAwayFromZero(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundAwayFromZeroTest(System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtau v16.2d, v0.2d
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
5. ConvertToUInt64RoundAwayFromZeroScalar
Vector64<ulong> ConvertToUInt64RoundAwayFromZeroScalar(Vector64<double> value)
This method converts the single element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round to Nearest with Ties to Away rounding mode, stores it in the result vector, and returns the result vector.
private Vector64<ulong> ConvertToUInt64RoundAwayFromZeroScalarTest(Vector64<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundAwayFromZeroScalar(value);
}
// value = <11.5>
// Result = <12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundAwayFromZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtau d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
6. ConvertToUInt64RoundToEven
Vector128<ulong> ConvertToUInt64RoundToEven(Vector128<double> value)
This method converts each element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round to Nearest with Ties to Even rounding mode, stores the results in the result vector, and returns the result vector.
private Vector128<ulong> ConvertToUInt64RoundToEvenTest(Vector128<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToEven(value);
}
// value = <11.5, 12.5>
// Result = <12, 12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToEvenTest(System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtnu v16.2d, v0.2d
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
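The RoundAwayFromZero and RoundToEven conversions above differ only in how they treat values that lie exactly halfway between two integers. A minimal portable sketch of that difference follows. NearestModel, TiesAway and TiesToEven are hypothetical names used only for illustration, and the model assumes inputs that are non-negative and fit in a ulong.

```csharp
using System;

// Hypothetical model of the two round-to-nearest modes; assumes 0 <= x < 2^64.
public static class NearestModel
{
    // Ties go to the neighbour with the larger magnitude (the FCVTAU behaviour).
    public static ulong TiesAway(double x) =>
        (ulong)Math.Round(x, MidpointRounding.AwayFromZero);

    // Ties go to the even neighbour (the FCVTNU behaviour).
    public static ulong TiesToEven(double x) =>
        (ulong)Math.Round(x, MidpointRounding.ToEven);
}
```

Both methods map 11.5 to 12, but 12.5 becomes 13 with TiesAway and 12 with TiesToEven, which is the same difference visible in the <12, 13> and <12, 12> results above.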
7. ConvertToUInt64RoundToEvenScalar
Vector64<ulong> ConvertToUInt64RoundToEvenScalar(Vector64<double> value)
This method converts the single element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round to Nearest with Ties to Even rounding mode, stores it in the result vector, and returns the result vector.
private Vector64<ulong> ConvertToUInt64RoundToEvenScalarTest(Vector64<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToEvenScalar(value);
}
// value = <11.5>
// Result = <12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToEvenScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtnu d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
8. ConvertToUInt64RoundToNegativeInfinity
Vector128<ulong> ConvertToUInt64RoundToNegativeInfinity(Vector128<double> value)
This method converts each element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Minus Infinity rounding mode, and returns the result.
private Vector128<ulong> ConvertToUInt64RoundToNegativeInfinityTest(Vector128<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToNegativeInfinity(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToNegativeInfinityTest(System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtmu v16.2d, v0.2d
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
9. ConvertToUInt64RoundToNegativeInfinityScalar
Vector64<ulong> ConvertToUInt64RoundToNegativeInfinityScalar(Vector64<double> value)
This method converts the single element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Minus Infinity rounding mode, and returns the result.
private Vector64<ulong> ConvertToUInt64RoundToNegativeInfinityScalarTest(Vector64<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToNegativeInfinityScalar(value);
}
// value = <11.5>
// Result = <11>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToNegativeInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtmu d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
10. ConvertToUInt64RoundToPositiveInfinity
Vector128<ulong> ConvertToUInt64RoundToPositiveInfinity(Vector128<double> value)
This method converts each element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Plus Infinity rounding mode, and returns the result.
private Vector128<ulong> ConvertToUInt64RoundToPositiveInfinityTest(Vector128<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToPositiveInfinity(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToPositiveInfinityTest(System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtpu v16.2d, v0.2d
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
11. ConvertToUInt64RoundToPositiveInfinityScalar
Vector64<ulong> ConvertToUInt64RoundToPositiveInfinityScalar(Vector64<double> value)
This method converts the single element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Plus Infinity rounding mode, and returns the result.
private Vector64<ulong> ConvertToUInt64RoundToPositiveInfinityScalarTest(Vector64<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToPositiveInfinityScalar(value);
}
// value = <11.5>
// Result = <12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToPositiveInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtpu d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
12. ConvertToUInt64RoundToZero
Vector128<ulong> ConvertToUInt64RoundToZero(Vector128<double> value)
This method converts each element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Zero rounding mode, and returns the result.
private Vector128<ulong> ConvertToUInt64RoundToZeroTest(Vector128<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToZero(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToZeroTest(System.Runtime.Intrinsics.Vector128`1[Double]):System.Runtime.Intrinsics.Vector128`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtzu v16.2d, v0.2d
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
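The three directed rounding modes used by the RoundToNegativeInfinity, RoundToPositiveInfinity and RoundToZero conversions above can be modelled with ordinary Math calls. The following is a rough sketch under stated assumptions: DirectedModel and its method names are hypothetical, and the inputs are assumed to be non-negative with a rounded value that fits in a ulong.

```csharp
using System;

// Hypothetical model of the directed rounding modes; assumes non-negative,
// in-range inputs so that the cast to ulong is well defined.
public static class DirectedModel
{
    // Round towards Minus Infinity (the FCVTMU behaviour).
    public static ulong TowardNegativeInfinity(double x) => (ulong)Math.Floor(x);

    // Round towards Plus Infinity (the FCVTPU behaviour).
    public static ulong TowardPositiveInfinity(double x) => (ulong)Math.Ceiling(x);

    // Round towards Zero (the FCVTZU behaviour).
    public static ulong TowardZero(double x) => (ulong)Math.Truncate(x);
}
```

For non-negative inputs, rounding toward zero and rounding toward minus infinity pick the same integer, which is why the RoundToNegativeInfinity and RoundToZero examples above both produce <11, 12> for <11.5, 12.5>.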
13. ConvertToUInt64RoundToZeroScalar
Vector64<ulong> ConvertToUInt64RoundToZeroScalar(Vector64<double> value)
This method converts the single element in the value vector from a floating-point value to a 64-bit unsigned integer using the Round towards Zero rounding mode, and returns the result.
private Vector64<ulong> ConvertToUInt64RoundToZeroScalarTest(Vector64<double> value)
{
    return AdvSimd.Arm64.ConvertToUInt64RoundToZeroScalar(value);
}
// value = <11.5>
// Result = <11>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ConvertToUInt64RoundToZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcvtzu d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. Divide
Vector64<float> Divide(Vector64<float> left, Vector64<float> right)
This method divides the floating-point values in the left vector by the corresponding values in the right vector, stores the quotients in a result vector, and returns the result vector.
private Vector64<float> DivideTest(Vector64<float> left, Vector64<float> right)
{
    return AdvSimd.Arm64.Divide(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <0.53488374, 0.5555556>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> Divide(Vector128<double> left, Vector128<double> right)
Vector128<float> Divide(Vector128<float> left, Vector128<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DivideTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fdiv v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
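Since these intrinsics only run on Arm64 hardware, calls to them are usually guarded by an IsSupported check. The sketch below shows one way to do that for Divide. DivideExample and DivideOrFallback are hypothetical names, and the lane-by-lane fallback is just one possible choice for other platforms.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class DivideExample
{
    // Hypothetical helper: uses AdvSimd.Arm64.Divide when the hardware supports it,
    // otherwise divides the two lanes one at a time.
    public static Vector64<float> DivideOrFallback(Vector64<float> left, Vector64<float> right)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            return AdvSimd.Arm64.Divide(left, right);
        }

        // Portable fallback for machines without Arm64 AdvSimd support.
        return Vector64.Create(
            left.GetElement(0) / right.GetElement(0),
            left.GetElement(1) / right.GetElement(1));
    }
}
```

Calling DivideOrFallback(Vector64.Create(11.5f, 12.5f), Vector64.Create(21.5f, 22.5f)) should give the same two quotients on either path.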
15. DivideScalar
Vector64<double> DivideScalar(Vector64<double> left, Vector64<double> right)
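This method divides the first floating-point element of the left vector by the first element of the right vector, stores the quotient in the first element of a result vector, and returns the result vector. For the Vector64<float> overload, the remaining element of the result is set to 0.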
private Vector64<double> DivideScalarTest(Vector64<double> left, Vector64<double> right)
{
    return AdvSimd.DivideScalar(left, right);
}
// left = <11>
// right = <3.1>
// Result = <3.5483873>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> DivideScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DivideScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fdiv d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. DuplicateSelectedScalarToVector128
Vector128<byte> DuplicateSelectedScalarToVector128(Vector64<byte> value, byte index)
This method creates a vector by duplicating the element at index in the value vector into each element of the result vector. As seen in the example below, the result is always a Vector128, so for a Vector64<byte> input it holds 16 elements rather than 8.
private Vector128<byte> DuplicateSelectedScalarToVector128Test(Vector64<byte> value, byte index)
{
    return AdvSimd.DuplicateSelectedScalarToVector128(value, 3);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 3
// Result = <14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<short> DuplicateSelectedScalarToVector128(Vector64<short> value, byte index)
Vector128<int> DuplicateSelectedScalarToVector128(Vector64<int> value, byte index)
Vector128<float> DuplicateSelectedScalarToVector128(Vector64<float> value, byte index)
Vector128<sbyte> DuplicateSelectedScalarToVector128(Vector64<sbyte> value, byte index)
Vector128<ushort> DuplicateSelectedScalarToVector128(Vector64<ushort> value, byte index)
Vector128<uint> DuplicateSelectedScalarToVector128(Vector64<uint> value, byte index)
Vector128<byte> DuplicateSelectedScalarToVector128(Vector128<byte> value, byte index)
Vector128<short> DuplicateSelectedScalarToVector128(Vector128<short> value, byte index)
Vector128<int> DuplicateSelectedScalarToVector128(Vector128<int> value, byte index)
Vector128<float> DuplicateSelectedScalarToVector128(Vector128<float> value, byte index)
Vector128<sbyte> DuplicateSelectedScalarToVector128(Vector128<sbyte> value, byte index)
Vector128<ushort> DuplicateSelectedScalarToVector128(Vector128<ushort> value, byte index)
Vector128<uint> DuplicateSelectedScalarToVector128(Vector128<uint> value, byte index)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> DuplicateSelectedScalarToVector128(Vector128<double> value, byte index)
Vector128<long> DuplicateSelectedScalarToVector128(Vector128<long> value, byte index)
Vector128<ulong> DuplicateSelectedScalarToVector128(Vector128<ulong> value, byte index)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DuplicateSelectedScalarToVector128Test(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
dup v16.16b, v0.b[3]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
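To make the widening from 8 to 16 lanes concrete, here is a small sketch that pairs the intrinsic with a portable equivalent. DuplicateExample and BroadcastLane3 are hypothetical names, and the fallback built on Vector128.Create is an assumption about how one might express the same result on other platforms, not something the generated code above does.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class DuplicateExample
{
    // Hypothetical helper: copies lane 3 of a 64-bit vector into all 16 lanes
    // of a 128-bit vector.
    public static Vector128<byte> BroadcastLane3(Vector64<byte> value)
    {
        if (AdvSimd.IsSupported)
        {
            // The lane index is passed as a constant, as in the sample above.
            return AdvSimd.DuplicateSelectedScalarToVector128(value, 3);
        }

        // Portable fallback: read lane 3 and broadcast it to every lane.
        return Vector128.Create(value.GetElement(3));
    }
}
```

For the input <11, 12, 13, 14, 15, 16, 17, 18>, every one of the 16 result lanes holds 14 on both paths.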
17. DuplicateSelectedScalarToVector64
Vector64<byte> DuplicateSelectedScalarToVector64(Vector64<byte> value, byte index)
This method creates a vector by duplicating the element at index in the value vector into each element of the result vector.
private Vector64<byte> DuplicateSelectedScalarToVector64Test(Vector64<byte> value, byte index)
{
    return AdvSimd.DuplicateSelectedScalarToVector64(value, 3);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 3
// Result = <14, 14, 14, 14, 14, 14, 14, 14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> DuplicateSelectedScalarToVector64(Vector64<short> value, byte index)
Vector64<int> DuplicateSelectedScalarToVector64(Vector64<int> value, byte index)
Vector64<float> DuplicateSelectedScalarToVector64(Vector64<float> value, byte index)
Vector64<sbyte> DuplicateSelectedScalarToVector64(Vector64<sbyte> value, byte index)
Vector64<ushort> DuplicateSelectedScalarToVector64(Vector64<ushort> value, byte index)
Vector64<uint> DuplicateSelectedScalarToVector64(Vector64<uint> value, byte index)
Vector64<byte> DuplicateSelectedScalarToVector64(Vector128<byte> value, byte index)
Vector64<short> DuplicateSelectedScalarToVector64(Vector128<short> value, byte index)
Vector64<int> DuplicateSelectedScalarToVector64(Vector128<int> value, byte index)
Vector64<float> DuplicateSelectedScalarToVector64(Vector128<float> value, byte index)
Vector64<sbyte> DuplicateSelectedScalarToVector64(Vector128<sbyte> value, byte index)
Vector64<ushort> DuplicateSelectedScalarToVector64(Vector128<ushort> value, byte index)
Vector64<uint> DuplicateSelectedScalarToVector64(Vector128<uint> value, byte index)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DuplicateSelectedScalarToVector64Test(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
dup v16.8b, v0.b[3]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
18. DuplicateToVector128
Vector128<byte> DuplicateToVector128(byte value)
This method creates a vector by duplicating value into each element of the result vector.
private Vector128<byte> DuplicateToVector128Test(byte value)
{
    return AdvSimd.DuplicateToVector128(value);
}
// value = 7
// Result = <7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<short> DuplicateToVector128(short value)
Vector128<int> DuplicateToVector128(int value)
Vector128<sbyte> DuplicateToVector128(sbyte value)
Vector128<float> DuplicateToVector128(float value)
Vector128<ushort> DuplicateToVector128(ushort value)
Vector128<uint> DuplicateToVector128(uint value)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> DuplicateToVector128(double value)
Vector128<long> DuplicateToVector128(long value)
Vector128<ulong> DuplicateToVector128(ulong value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DuplicateToVector128Test(ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) ubyte -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uxtb w0, w0
dup v16.16b, w0
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 28, prolog size 8
19. DuplicateToVector64
Vector64<byte> DuplicateToVector64(byte value)
This method creates a vector by duplicating value into each element of the result vector.
private Vector64<byte> DuplicateToVector64Test(byte value)
{
    return AdvSimd.DuplicateToVector64(value);
}
// value = 5
// Result = <5, 5, 5, 5, 5, 5, 5, 5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> DuplicateToVector64(short value)
Vector64<int> DuplicateToVector64(int value)
Vector64<sbyte> DuplicateToVector64(sbyte value)
Vector64<float> DuplicateToVector64(float value)
Vector64<ushort> DuplicateToVector64(ushort value)
Vector64<uint> DuplicateToVector64(uint value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:DuplicateToVector64Test(ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) ubyte -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uxtb w0, w0
dup v16.8b, w0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 28, prolog size 8
20. Extract
byte Extract(Vector64<byte> vector, byte index)
This method extracts the element at index from vector and returns it.
private byte ExtractTest(Vector64<byte> vector, byte index)
{
    return AdvSimd.Extract(vector, 3);
}
// vector = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 3
// Result = 14
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
short Extract(Vector64<short> vector, byte index)
int Extract(Vector64<int> vector, byte index)
sbyte Extract(Vector64<sbyte> vector, byte index)
float Extract(Vector64<float> vector, byte index)
ushort Extract(Vector64<ushort> vector, byte index)
uint Extract(Vector64<uint> vector, byte index)
byte Extract(Vector128<byte> vector, byte index)
double Extract(Vector128<double> vector, byte index)
short Extract(Vector128<short> vector, byte index)
int Extract(Vector128<int> vector, byte index)
long Extract(Vector128<long> vector, byte index)
sbyte Extract(Vector128<sbyte> vector, byte index)
float Extract(Vector128<float> vector, byte index)
ushort Extract(Vector128<ushort> vector, byte index)
uint Extract(Vector128<uint> vector, byte index)
ulong Extract(Vector128<ulong> vector, byte index)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):ubyte
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umov w0, v0.b[3]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
21. ExtractNarrowingLower
Vector64<byte> ExtractNarrowingLower(Vector128<ushort> value)
This method narrows each element in the value vector to half its original width by keeping only the lower half of its bits, stores the results in a result vector, and returns that vector. As seen in the example below, the result element type byte is half as wide as the input element type ushort.
private Vector64<byte> ExtractNarrowingLowerTest(Vector128<ushort> value)
{
    return AdvSimd.ExtractNarrowingLower(value);
}
// value = <300, 12, 413, 514, 15, 216, 117, 618>
// Result = <44, 12, 157, 2, 15, 216, 117, 106>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ExtractNarrowingLower(Vector128<int> value)
Vector64<int> ExtractNarrowingLower(Vector128<long> value)
Vector64<sbyte> ExtractNarrowingLower(Vector128<short> value)
Vector64<ushort> ExtractNarrowingLower(Vector128<uint> value)
Vector64<uint> ExtractNarrowingLower(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
xtn v16.8b, v0.8h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
22. ExtractNarrowingSaturateLower
Vector64<byte> ExtractNarrowingSaturateLower(Vector128<ushort> value)
This method narrows each element in the value vector to half its original width with saturation, so a value too large for the narrower type becomes that type's maximum. It stores the results in a result vector and returns the result vector.
private Vector64<byte> ExtractNarrowingSaturateLowerTest(Vector128<ushort> value)
{
    return AdvSimd.ExtractNarrowingSaturateLower(value);
}
// value = <300, 12, 413, 514, 15, 216, 117, 618>
// Result = <255, 12, 255, 255, 15, 216, 117, 255>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ExtractNarrowingSaturateLower(Vector128<int> value)
Vector64<int> ExtractNarrowingSaturateLower(Vector128<long> value)
Vector64<sbyte> ExtractNarrowingSaturateLower(Vector128<short> value)
Vector64<ushort> ExtractNarrowingSaturateLower(Vector128<uint> value)
Vector64<uint> ExtractNarrowingSaturateLower(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqxtn v16.8b, v0.8h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
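The contrast between ExtractNarrowingLower and ExtractNarrowingSaturateLower is easiest to see side by side. The following portable sketch models both for the ushort to byte case. NarrowModel, Truncate and Saturate are hypothetical names, and arrays stand in for the vector types.

```csharp
using System;

// Hypothetical model of the two narrowing flavours for ushort -> byte.
public static class NarrowModel
{
    // Plain narrowing (the XTN behaviour): keep the low 8 bits of each element.
    public static byte[] Truncate(ushort[] value) =>
        Array.ConvertAll(value, x => unchecked((byte)x));

    // Saturating narrowing (the UQXTN behaviour): clamp to byte.MaxValue first.
    public static byte[] Saturate(ushort[] value) =>
        Array.ConvertAll(value, x => (byte)Math.Min(x, byte.MaxValue));
}
```

For the element 300, Truncate gives 44 (300 - 256) while Saturate gives 255. Elements that already fit in a byte, such as 12 or 216, come out unchanged in both cases.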
23. ExtractNarrowingSaturateScalar
Vector64<byte> ExtractNarrowingSaturateScalar(Vector64<ushort> value)
This method narrows the 0th element in the value vector to half its original width with saturation, stores it in the 0th element of a result vector, and returns the result vector. All elements of the result other than the 0th are set to 0.
private Vector64<byte> ExtractNarrowingSaturateScalarTest(Vector64<ushort> value)
{
    return AdvSimd.Arm64.ExtractNarrowingSaturateScalar(value);
}
// value = <500, 500, 500, 500>
// Result = <255, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<short> ExtractNarrowingSaturateScalar(Vector64<int> value)
Vector64<int> ExtractNarrowingSaturateScalar(Vector64<long> value)
Vector64<sbyte> ExtractNarrowingSaturateScalar(Vector64<short> value)
Vector64<ushort> ExtractNarrowingSaturateScalar(Vector64<uint> value)
Vector64<uint> ExtractNarrowingSaturateScalar(Vector64<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqxtn b16, h0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
24. ExtractNarrowingSaturateUnsignedLower
Vector64<byte> ExtractNarrowingSaturateUnsignedLower(Vector128<short> value)
This method narrows each element in the value vector, which is always a signed integer, to an unsigned integer of half the original width with saturation, stores the results in a result vector, and returns the result vector. As seen in the example below, the result element type byte is half as wide as the input element type short.
private Vector64<byte> ExtractNarrowingSaturateUnsignedLowerTest(Vector128<short> value)
{
    return AdvSimd.ExtractNarrowingSaturateUnsignedLower(value);
}
// value = <-300, -12, 413, 514, 15, 216, 117, 618>
// Result = <0, 0, 255, 255, 15, 216, 117, 255>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ushort> ExtractNarrowingSaturateUnsignedLower(Vector128<int> value)
Vector64<uint> ExtractNarrowingSaturateUnsignedLower(Vector128<long> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateUnsignedLowerTest(System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqxtun v16.8b, v0.8h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
25. ExtractNarrowingSaturateUnsignedScalar
Vector64<byte> ExtractNarrowingSaturateUnsignedScalar(Vector64<short> value)
This method saturates the 0th element of the value vector (always a signed integer) to an unsigned integer that is half the original width, stores it in the 0th element of the result vector, and returns the result vector. As the example below shows, the result element type byte is half as wide as the input element type short. All other elements of the result vector are set to 0.
private Vector64<byte> ExtractNarrowingSaturateUnsignedScalarTest(Vector64<short> value)
{
return AdvSimd.Arm64.ExtractNarrowingSaturateUnsignedScalar(value);
}
// value = <11, 12, 13, 14>
// Result = <11, 0, 0, 0, 0, 0, 0, 0>
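As a rough scalar model of this behaviour (an illustrative sketch only; NarrowSaturateUnsignedScalar is a made-up helper name and arrays stand in for the vector types), only element 0 is converted and every other result element stays 0:

```csharp
using System;

// Scalar model of the scalar form of SQXTUN: only element 0 of the input
// is clamped to [0, 255] and narrowed; the rest of the result is zero.
// NarrowSaturateUnsignedScalar is a hypothetical helper, not a .NET API.
static byte[] NarrowSaturateUnsignedScalar(short[] value)
{
    // A Vector64<byte> holds 8 elements; a new array starts out all zero.
    var result = new byte[8];
    result[0] = (byte)Math.Clamp((int)value[0], 0, 255);
    return result;
}
```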
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<ushort> ExtractNarrowingSaturateUnsignedScalar(Vector64<int> value)
Vector64<uint> ExtractNarrowingSaturateUnsignedScalar(Vector64<long> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqxtun b16, h0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
26. ExtractNarrowingSaturateUnsignedUpper
Vector128<byte> ExtractNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value)
This method saturates each element of the value vector (always a signed integer) to an unsigned integer that is half the original width and stores the results in the upper half of the result vector. The lower half of the result vector contains the values from the lower vector. As the example below shows, the result element type byte is half as wide as the input element type short.
private Vector128<byte> ExtractNarrowingSaturateUnsignedUpperTest(Vector64<byte> lower, Vector128<short> value)
{
return AdvSimd.ExtractNarrowingSaturateUnsignedUpper(lower, value);
}
// lower = <125, 12, 13, 14, 15, 216, 117, 18>
// value = <-500, 500, 12, 14, 257, 16, 17, 18>
// Result = <125, 12, 13, 14, 15, 216, 117, 18, 0, 255, 12, 14, 255, 16, 17, 18>
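The way the two inputs are combined can be sketched in scalar C#. This is a reference model under the assumption that arrays stand in for the vector types; NarrowSaturateUnsignedUpper is a hypothetical helper name, not a .NET API.

```csharp
using System;

// Scalar model of SQXTUN2: the lower half of the result is copied from
// `lower`, and the upper half holds every element of `value`, each one
// clamped to the unsigned 8-bit range [0, 255].
// NarrowSaturateUnsignedUpper is a hypothetical helper, not a .NET API.
static byte[] NarrowSaturateUnsignedUpper(byte[] lower, short[] value)
{
    var result = new byte[lower.Length + value.Length];
    Array.Copy(lower, result, lower.Length);
    for (int i = 0; i < value.Length; i++)
    {
        result[lower.Length + i] = (byte)Math.Clamp((int)value[i], 0, 255);
    }
    return result;
}
```

Note that all eight elements of value contribute to the result; only the destination is restricted to the upper half.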
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<ushort> ExtractNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value)
Vector128<uint> ExtractNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateUnsignedUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[Int16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqxtun2 v0.16b, v1.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
27. ExtractNarrowingSaturateUpper
Vector128<byte> ExtractNarrowingSaturateUpper(Vector64<byte> lower, Vector128<ushort> value)
This method saturates each element of the value vector to half the original width and stores the results in the upper half of the result vector. The lower half of the result vector contains the values from the lower vector.
private Vector128<byte> ExtractNarrowingSaturateUpperTest(Vector64<byte> lower, Vector128<ushort> value)
{
return AdvSimd.ExtractNarrowingSaturateUpper(lower, value);
}
// lower = <125, 12, 13, 14, 15, 216, 117, 18>
// value = <500, 500, 12, 14, 257, 16, 17, 18>
// Result = <125, 12, 13, 14, 15, 216, 117, 18, 255, 255, 12, 14, 255, 16, 17, 18>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<short> ExtractNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value)
Vector128<int> ExtractNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value)
Vector128<sbyte> ExtractNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value)
Vector128<ushort> ExtractNarrowingSaturateUpper(Vector64<ushort> lower, Vector128<uint> value)
Vector128<uint> ExtractNarrowingSaturateUpper(Vector64<uint> lower, Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingSaturateUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqxtn2 v0.16b, v1.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
28. ExtractNarrowingUpper
Vector128<byte> ExtractNarrowingUpper(Vector64<byte> lower, Vector128<ushort> value)
This method narrows each element of the value vector to half the original width by keeping only its low-order bits, stores the results in the upper half of the result vector, and returns it. The lower half of the result vector contains the values from the lower vector. As the example below shows, the result element type byte is half as wide as the input element type ushort.
private Vector128<byte> ExtractNarrowingUpperTest(Vector64<byte> lower, Vector128<ushort> value)
{
return AdvSimd.ExtractNarrowingUpper(lower, value);
}
// lower = <125, 12, 13, 14, 15, 216, 117, 18>
// value = <500, 500, 12, 14, 257, 16, 17, 18>
// Result = <125, 12, 13, 14, 15, 216, 117, 18, 244, 244, 12, 14, 1, 16, 17, 18>
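The same value vector gives 255 under ExtractNarrowingSaturateUpper but 244 here, because plain narrowing truncates instead of clamping. A minimal per-element sketch of the two rules (the helper names NarrowTruncate and NarrowSaturate are invented for this illustration):

```csharp
using System;

// Plain narrowing (XTN2-style): keep only the low 8 bits of the element.
// NarrowTruncate is a hypothetical helper, not a .NET API.
static byte NarrowTruncate(ushort x) => unchecked((byte)x);

// Saturating narrowing (UQXTN2-style): clamp to 255, then narrow.
// NarrowSaturate is a hypothetical helper, not a .NET API.
static byte NarrowSaturate(ushort x) => (byte)Math.Min(x, (ushort)255);
```

For the input 500 (0x01F4), truncation keeps 0xF4, which is 244, while saturation yields 255. For 257 (0x0101), truncation keeps 1.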
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<short> ExtractNarrowingUpper(Vector64<short> lower, Vector128<int> value)
Vector128<int> ExtractNarrowingUpper(Vector64<int> lower, Vector128<long> value)
Vector128<sbyte> ExtractNarrowingUpper(Vector64<sbyte> lower, Vector128<short> value)
Vector128<ushort> ExtractNarrowingUpper(Vector64<ushort> lower, Vector128<uint> value)
Vector128<uint> ExtractNarrowingUpper(Vector64<uint> lower, Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractNarrowingUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
xtn2 v0.16b, v1.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
29. ExtractVector128
Vector128<byte> ExtractVector128(Vector128<byte> upper, Vector128<byte> lower, byte index)
This method fills the result vector with elements of upper, starting at index (which must therefore be less than the number of elements in the vector). Once the elements of upper run out, the remaining slots are filled with elements from the start of lower.
private Vector128<byte> ExtractVector128Test(Vector128<byte> upper, Vector128<byte> lower, byte index)
{
return AdvSimd.ExtractVector128(upper, lower, 5);
}
// upper = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// lower = <31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46>
// index = 5
// Result = <16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 31, 32, 33, 34, 35>
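One way to picture the operation is as a sliding window over upper followed by lower. The sketch below is a scalar model under that reading, with arrays standing in for the vector types; ExtractVector is a hypothetical helper name, not a .NET API.

```csharp
using System;

// Scalar model of EXT: conceptually concatenate `upper` followed by `lower`,
// then take upper.Length consecutive elements starting at `index`.
// ExtractVector is a hypothetical helper, not a .NET API.
static byte[] ExtractVector(byte[] upper, byte[] lower, int index)
{
    if (upper.Length != lower.Length)
        throw new ArgumentException("Both vectors must have the same length.");
    if (index < 0 || index >= upper.Length)
        throw new ArgumentOutOfRangeException(nameof(index));

    int n = upper.Length;
    var result = new byte[n];
    for (int i = 0; i < n; i++)
    {
        int src = index + i;
        // Read from upper while it lasts, then continue at the start of lower.
        result[i] = src < n ? upper[src] : lower[src - n];
    }
    return result;
}
```

With 16-element inputs and index 5 this reproduces the result above: eleven elements from upper followed by the first five of lower. The same model with 8-element arrays describes the Vector64 form.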
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<double> ExtractVector128(Vector128<double> upper, Vector128<double> lower, byte index)
Vector128<short> ExtractVector128(Vector128<short> upper, Vector128<short> lower, byte index)
Vector128<int> ExtractVector128(Vector128<int> upper, Vector128<int> lower, byte index)
Vector128<long> ExtractVector128(Vector128<long> upper, Vector128<long> lower, byte index)
Vector128<sbyte> ExtractVector128(Vector128<sbyte> upper, Vector128<sbyte> lower, byte index)
Vector128<float> ExtractVector128(Vector128<float> upper, Vector128<float> lower, byte index)
Vector128<ushort> ExtractVector128(Vector128<ushort> upper, Vector128<ushort> lower, byte index)
Vector128<uint> ExtractVector128(Vector128<uint> upper, Vector128<uint> lower, byte index)
Vector128<ulong> ExtractVector128(Vector128<ulong> upper, Vector128<ulong> lower, byte index)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractVector128Test(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ext v16.16b, v0.16b, v1.16b, #5
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
30. ExtractVector64
Vector64<byte> ExtractVector64(Vector64<byte> upper, Vector64<byte> lower, byte index)
This method fills the result vector with elements of upper, starting at index (which must therefore be less than the number of elements in the vector). Once the elements of upper run out, the remaining slots are filled with elements from the start of lower. This method is the same as ExtractVector128() except that it operates on Vector64<T>.
private Vector64<byte> ExtractVector64Test(Vector64<byte> upper, Vector64<byte> lower, byte index)
{
return AdvSimd.ExtractVector64(upper, lower, 5);
}
// upper = <11, 12, 13, 14, 15, 16, 17, 18>
// lower = <21, 22, 23, 24, 25, 26, 27, 28>
// index = 5
// Result = <16, 17, 18, 21, 22, 23, 24, 25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ExtractVector64(Vector64<short> upper, Vector64<short> lower, byte index)
Vector64<int> ExtractVector64(Vector64<int> upper, Vector64<int> lower, byte index)
Vector64<sbyte> ExtractVector64(Vector64<sbyte> upper, Vector64<sbyte> lower, byte index)
Vector64<float> ExtractVector64(Vector64<float> upper, Vector64<float> lower, byte index)
Vector64<ushort> ExtractVector64(Vector64<ushort> upper, Vector64<ushort> lower, byte index)
Vector64<uint> ExtractVector64(Vector64<uint> upper, Vector64<uint> lower, byte index)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ExtractVector64Test(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ext v16.8b, v0.8b, v1.8b, #5
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
31. Floor
Vector64<float> Floor(Vector64<float> value)
This method rounds each floating-point element of the value vector to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, places the results in a vector, and returns it. As per the ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> FloorTest(Vector64<float> value)
{
return AdvSimd.Floor(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>
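The special cases listed above can be tried out with MathF.Floor, which applies the same round-towards-minus-infinity rule one element at a time. This is a scalar sketch rather than the intrinsic; FloorAll is a made-up helper name and an array stands in for the vector.

```csharp
using System;

// Scalar model of FRINTM applied to every element: MathF.Floor rounds
// towards minus infinity, keeps the sign of zero, and passes infinities
// and NaN through. FloorAll is a hypothetical helper, not a .NET API.
static float[] FloorAll(float[] value)
{
    var result = new float[value.Length];
    for (int i = 0; i < value.Length; i++)
    {
        result[i] = MathF.Floor(value[i]);
    }
    return result;
}
```

Negative inputs are where this differs from truncation: -11.5 rounds to -12, not -11.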
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<float> Floor(Vector128<float> value)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> Floor(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FloorTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintm v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
32. FloorScalar
Vector64<double> FloorScalar(Vector64<double> value)
This method rounds the 0th floating-point element of the value vector to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, places the result in the 0th element of the result vector, and returns it. As per the ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> FloorScalarTest(Vector64<double> value)
{
return AdvSimd.FloorScalar(value);
}
// value = <11.5>
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> FloorScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FloorScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintm d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
33. FusedAddHalving
Vector64<byte> FusedAddHalving(Vector64<byte> left, Vector64<byte> right)
This method adds corresponding elements of the left and right vectors, shifts each sum right by one bit, places the truncated results in a vector, and returns it.
private Vector64<byte> FusedAddHalvingTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.FusedAddHalving(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <16, 17, 18, 19, 20, 21, 22, 23>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> FusedAddHalving(Vector64<short> left, Vector64<short> right)
Vector64<int> FusedAddHalving(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> FusedAddHalving(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> FusedAddHalving(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> FusedAddHalving(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> FusedAddHalving(Vector128<byte> left, Vector128<byte> right)
Vector128<short> FusedAddHalving(Vector128<short> left, Vector128<short> right)
Vector128<int> FusedAddHalving(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> FusedAddHalving(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> FusedAddHalving(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> FusedAddHalving(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedAddHalvingTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uhadd v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
34. FusedAddRoundedHalving
Vector64<byte> FusedAddRoundedHalving(Vector64<byte> left, Vector64<byte> right)
This method adds corresponding elements of the left and right vectors, shifts each sum right by one bit, places the rounded results in a vector, and returns it.
private Vector64<byte> FusedAddRoundedHalvingTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.FusedAddRoundedHalving(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <16, 17, 18, 19, 20, 21, 22, 23>
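In the example above every sum is even, so FusedAddHalving and FusedAddRoundedHalving return the same values. The difference only shows up for odd sums. A minimal per-element sketch (HalvingAdd and RoundedHalvingAdd are invented helper names, not .NET APIs):

```csharp
using System;

// UHADD-style: (a + b) >> 1. In C# the sum of two bytes is an int,
// so the intermediate value cannot overflow.
// HalvingAdd is a hypothetical helper, not a .NET API.
static byte HalvingAdd(byte a, byte b) => (byte)((a + b) >> 1);

// URHADD-style: (a + b + 1) >> 1, which rounds a result ending in .5 up.
// RoundedHalvingAdd is a hypothetical helper, not a .NET API.
static byte RoundedHalvingAdd(byte a, byte b) => (byte)((a + b + 1) >> 1);
```

For 11 and 22 the sum is 33, so the truncating form gives 16 and the rounding form gives 17. For 255 and 255 both give 255, since the sum is formed at full precision before the shift.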
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> FusedAddRoundedHalving(Vector64<short> left, Vector64<short> right)
Vector64<int> FusedAddRoundedHalving(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> FusedAddRoundedHalving(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> FusedAddRoundedHalving(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> FusedAddRoundedHalving(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> FusedAddRoundedHalving(Vector128<byte> left, Vector128<byte> right)
Vector128<short> FusedAddRoundedHalving(Vector128<short> left, Vector128<short> right)
Vector128<int> FusedAddRoundedHalving(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> FusedAddRoundedHalving(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> FusedAddRoundedHalving(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> FusedAddRoundedHalving(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedAddRoundedHalvingTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
urhadd v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
35. FusedMultiplyAdd
Vector64<float> FusedMultiplyAdd(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
This method multiplies corresponding floating-point elements of the left and right vectors, adds each product to the corresponding element of the addend vector, and returns the accumulated result vector.
private Vector64<float> FusedMultiplyAddTest(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
{
return AdvSimd.FusedMultiplyAdd(addend, left, right);
}
// addend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// Result = <258.75, 293.75>
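Per element, the computation is addend + left * right with a single rounding at the end. The scalar sketch below models that with MathF.FusedMultiplyAdd; FusedMultiplyAddModel is a made-up helper name and arrays stand in for the vector types.

```csharp
using System;

// Scalar model of FMLA: result[i] = addend[i] + left[i] * right[i],
// rounded once. MathF.FusedMultiplyAdd(x, y, z) computes (x * y) + z,
// so the addend goes last here even though the intrinsic takes it first.
// FusedMultiplyAddModel is a hypothetical helper, not a .NET API.
static float[] FusedMultiplyAddModel(float[] addend, float[] left, float[] right)
{
    var result = new float[addend.Length];
    for (int i = 0; i < addend.Length; i++)
    {
        result[i] = MathF.FusedMultiplyAdd(left[i], right[i], addend[i]);
    }
    return result;
}
```

With the inputs from the example, element 0 is 11.5 + 21.5 * 11.5 = 258.75 and element 1 is 12.5 + 22.5 * 12.5 = 293.75.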
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<float> FusedMultiplyAdd(Vector128<float> addend, Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> FusedMultiplyAdd(Vector128<double> addend, Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmla v0.2s, v1.2s, v2.2s
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
36. FusedMultiplyAddByScalar
Vector64<float> FusedMultiplyAddByScalar(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
This method multiplies each element of the left vector by the floating-point element at index 0 of the right vector, adds each product to the corresponding element of the addend vector, and returns the accumulated result vector.
private Vector64<float> FusedMultiplyAddByScalarTest(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
{
return AdvSimd.Arm64.FusedMultiplyAddByScalar(addend, left, right);
}
// addend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// Result = <258.75, 271.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> FusedMultiplyAddByScalar(Vector128<double> addend, Vector128<double> left, Vector64<double> right)
Vector128<float> FusedMultiplyAddByScalar(Vector128<float> addend, Vector128<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddByScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmla v0.2s, v1.2s, v2.s[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
37. FusedMultiplyAddBySelectedScalar
Vector64<float> FusedMultiplyAddBySelectedScalar(Vector64<float> addend, Vector64<float> left, Vector64<float> right, byte rightIndex)
This method multiplies each element of the left vector by the floating-point element at index rightIndex of the right vector, adds each product to the corresponding element of the addend vector, and returns the accumulated result vector.
private Vector64<float> FusedMultiplyAddBySelectedScalarTest(Vector64<float> addend, Vector64<float> left, Vector64<float> right, byte rightIndex)
{
return AdvSimd.Arm64.FusedMultiplyAddBySelectedScalar(addend, left, right, 0);
}
// addend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <258.75, 271.25>
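The "by selected scalar" forms broadcast one element of right across all lanes. A scalar sketch of that idea, with arrays standing in for the vector types (FusedMultiplyAddBySelectedScalarModel is an invented helper name, not a .NET API):

```csharp
using System;

// Scalar model of FMLA (by element): one element of `right` is picked by
// `rightIndex` and multiplied with every element of `left`.
// FusedMultiplyAddBySelectedScalarModel is a hypothetical helper.
static float[] FusedMultiplyAddBySelectedScalarModel(
    float[] addend, float[] left, float[] right, int rightIndex)
{
    float scalar = right[rightIndex]; // the broadcast element
    var result = new float[addend.Length];
    for (int i = 0; i < addend.Length; i++)
    {
        result[i] = MathF.FusedMultiplyAdd(left[i], scalar, addend[i]);
    }
    return result;
}
```

With rightIndex 0 this matches both the example above and FusedMultiplyAddByScalar: element 1 is 12.5 + 22.5 * 11.5 = 271.25. With rightIndex 1 the multiplier becomes 12.5 instead.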
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> FusedMultiplyAddBySelectedScalar(Vector64<float> addend, Vector64<float> left, Vector128<float> right, byte rightIndex)
Vector128<double> FusedMultiplyAddBySelectedScalar(Vector128<double> addend, Vector128<double> left, Vector128<double> right, byte rightIndex)
Vector128<float> FusedMultiplyAddBySelectedScalar(Vector128<float> addend, Vector128<float> left, Vector64<float> right, byte rightIndex)
Vector128<float> FusedMultiplyAddBySelectedScalar(Vector128<float> addend, Vector128<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmla v0.2s, v1.2s, v2.s[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
38. FusedMultiplyAddNegatedScalar
Vector64<double> FusedMultiplyAddNegatedScalar(Vector64<double> addend, Vector64<double> left, Vector64<double> right)
This method multiplies the values of the left and right vectors, negates the product, subtracts the value of the addend vector from it, and returns the result.
private Vector64<double> FusedMultiplyAddNegatedScalarTest(Vector64<double> addend, Vector64<double> left, Vector64<double> right)
{
return AdvSimd.FusedMultiplyAddNegatedScalar(addend, left, right);
}
// addend = <100.5>
// left = <5.5>
// right = <15.5>
// Result = <-185.75>
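Written as a formula, the result is -(left * right) - addend. A scalar sketch of that, using Math.FusedMultiplyAdd so the product and sum are rounded once (FusedMultiplyAddNegatedModel is a made-up helper name, not a .NET API):

```csharp
using System;

// Scalar model of FNMADD: (-left * right) + (-addend), rounded once.
// Math.FusedMultiplyAdd(x, y, z) computes (x * y) + z.
// FusedMultiplyAddNegatedModel is a hypothetical helper, not a .NET API.
static double FusedMultiplyAddNegatedModel(double addend, double left, double right)
    => Math.FusedMultiplyAdd(-left, right, -addend);
```

For the inputs in the example, -(5.5 * 15.5) - 100.5 = -85.25 - 100.5 = -185.75.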
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> FusedMultiplyAddNegatedScalar(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddNegatedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fnmadd d16, d1, d2, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
39. FusedMultiplyAddScalar
Vector64<double> FusedMultiplyAddScalar(Vector64<double> addend, Vector64<double> left, Vector64<double> right)
This method multiplies the 0th floating-point elements of the left and right vectors, adds the product to the 0th element of the addend vector, and returns the result.
private Vector64<double> FusedMultiplyAddScalarTest(Vector64<double> addend, Vector64<double> left, Vector64<double> right)
{
return AdvSimd.FusedMultiplyAddScalar(addend, left, right);
}
// addend = <100.5>
// left = <5.5>
// right = <15.5>
// Result = <185.75>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> FusedMultiplyAddScalar(Vector64<float> addend, Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmadd d16, d1, d2, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
40. FusedMultiplyAddScalarBySelectedScalar
Vector64<double> FusedMultiplyAddScalarBySelectedScalar(Vector64<double> addend, Vector64<double> left, Vector128<double> right, byte rightIndex)
This method multiplies the 0th element of the left vector by the element at rightIndex of the right vector, adds the product to the 0th element of the addend vector, and returns the result vector.
private Vector64<double> FusedMultiplyAddScalarBySelectedScalarTest(Vector64<double> addend, Vector64<double> left, Vector128<double> right, byte rightIndex)
{
return AdvSimd.Arm64.FusedMultiplyAddScalarBySelectedScalar(addend, left, right, 0);
}
// addend = <11.5>
// left = <11.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <143.75>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> FusedMultiplyAddScalarBySelectedScalar(Vector64<float> addend, Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> FusedMultiplyAddScalarBySelectedScalar(Vector64<float> addend, Vector64<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplyAddScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmla d0, d1, v2.d[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
41. FusedMultiplySubtract
Vector64<float> FusedMultiplySubtract(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
This method multiplies corresponding floating-point elements of the left and right vectors, negates each product, adds it to the corresponding element of the minuend vector, and returns the result. In other words, each result element is minuend - (left * right).
private Vector64<float> FusedMultiplySubtractTest(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
{
return AdvSimd.FusedMultiplySubtract(minuend, left, right);
}
// minuend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// Result = <-235.75, -268.75>
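Per element, this is minuend - left * right with a single rounding. The scalar sketch below models it by negating one multiplicand before a fused multiply-add; FusedMultiplySubtractModel is a made-up helper name and arrays stand in for the vector types.

```csharp
using System;

// Scalar model of FMLS: result[i] = minuend[i] + (-left[i] * right[i]),
// rounded once. FusedMultiplySubtractModel is a hypothetical helper,
// not a .NET API.
static float[] FusedMultiplySubtractModel(float[] minuend, float[] left, float[] right)
{
    var result = new float[minuend.Length];
    for (int i = 0; i < minuend.Length; i++)
    {
        result[i] = MathF.FusedMultiplyAdd(-left[i], right[i], minuend[i]);
    }
    return result;
}
```

With the inputs from the example, element 0 is 11.5 - 21.5 * 11.5 = -235.75 and element 1 is 12.5 - 22.5 * 12.5 = -268.75.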
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<float> FusedMultiplySubtract(Vector128<float> minuend, Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> FusedMultiplySubtract(Vector128<double> minuend, Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmls v0.2s, v1.2s, v2.2s
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
42. FusedMultiplySubtractByScalar
Vector64<float> FusedMultiplySubtractByScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
This method multiplies each element of the left vector by the floating-point element at index 0 of the right vector, negates each product, adds it to the corresponding element of the minuend vector, and returns the accumulated result vector.
private Vector64<float> FusedMultiplySubtractByScalarTest(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
{
return AdvSimd.Arm64.FusedMultiplySubtractByScalar(minuend, left, right);
}
// minuend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// Result = <-235.75, -246.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> FusedMultiplySubtractByScalar(Vector128<double> minuend, Vector128<double> left, Vector64<double> right)
Vector128<float> FusedMultiplySubtractByScalar(Vector128<float> minuend, Vector128<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractByScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmls v0.2s, v1.2s, v2.s[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
43. FusedMultiplySubtractBySelectedScalar
Vector64<float> FusedMultiplySubtractBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right, byte rightIndex)
This method multiplies the floating-point element at index rightIndex of the right vector with each element of the left vector, negates each product, adds it to the corresponding element of the minuend vector, and returns the resulting vector. In other words, result[i] = minuend[i] - left[i] * right[rightIndex].
private Vector64<float> FusedMultiplySubtractBySelectedScalarTest(Vector64<float> minuend, Vector64<float> left, Vector64<float> right, byte rightIndex)
{
// The index is passed as the literal 0, so the rightIndex parameter is unused in this sample.
return AdvSimd.Arm64.FusedMultiplySubtractBySelectedScalar(minuend, left, right, 0);
}
// minuend = <11.5, 12.5>
// left = <21.5, 22.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <-235.75, -246.25>
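The sample above only exercises rightIndex = 0, which gives the same result as FusedMultiplySubtractByScalar. The following sketch shows how the result changes with the selected index. FmsBySelectedScalarSketch and its Reference helper are hypothetical names used only for illustration, and the helper assumes the semantics described above. The intrinsic call passes the index as a literal, matching the immediate lane index visible in the generated assembly.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class FmsBySelectedScalarSketch
{
    // Hypothetical reference helper (not a .NET API):
    // result[i] = minuend[i] - left[i] * right[rightIndex], rounded once per lane.
    public static float[] Reference(float[] minuend, float[] left, float[] right, int rightIndex)
    {
        var result = new float[minuend.Length];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = MathF.FusedMultiplyAdd(-left[i], right[rightIndex], minuend[i]);
        }
        return result;
    }

    public static void Main()
    {
        float[] minuend = { 11.5f, 12.5f };
        float[] left = { 21.5f, 22.5f };
        float[] right = { 11.5f, 12.5f };

        // Print the reference result for every valid index of a two-element vector.
        for (int index = 0; index < right.Length; index++)
        {
            float[] r = Reference(minuend, left, right, index);
            Console.WriteLine($"rightIndex = {index}: <{r[0]}, {r[1]}>");
        }

        // The intrinsic only runs on Arm64 hardware, so guard the call.
        if (AdvSimd.Arm64.IsSupported)
        {
            Vector64<float> actual = AdvSimd.Arm64.FusedMultiplySubtractBySelectedScalar(
                Vector64.Create(minuend[0], minuend[1]),
                Vector64.Create(left[0], left[1]),
                Vector64.Create(right[0], right[1]),
                1); // literal lane index
            Console.WriteLine($"Intrinsic, rightIndex = 1: {actual}");
        }
    }
}
```

With rightIndex = 1 every lane is multiplied by 12.5 instead of 11.5, so the reference result becomes <-257.25, -268.75>.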
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> FusedMultiplySubtractBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector128<float> right, byte rightIndex)
Vector128<double> FusedMultiplySubtractBySelectedScalar(Vector128<double> minuend, Vector128<double> left, Vector128<double> right, byte rightIndex)
Vector128<float> FusedMultiplySubtractBySelectedScalar(Vector128<float> minuend, Vector128<float> left, Vector64<float> right, byte rightIndex)
Vector128<float> FusedMultiplySubtractBySelectedScalar(Vector128<float> minuend, Vector128<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single],ubyte):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmls v0.2s, v1.2s, v2.s[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
44. FusedMultiplySubtractNegatedScalar
Vector64<double> FusedMultiplySubtractNegatedScalar(Vector64<double> minuend, Vector64<double> left, Vector64<double> right)
This method multiplies the scalar values of the left and right vectors, subtracts the scalar value of the minuend vector from the product, and returns the result. In other words, result = left * right - minuend.
private Vector64<double> FusedMultiplySubtractNegatedScalarTest(Vector64<double> minuend, Vector64<double> left, Vector64<double> right)
{
return AdvSimd.FusedMultiplySubtractNegatedScalar(minuend, left, right);
}
// minuend = <11.5>
// left = <11.5>
// right = <11>
// Result = <115>
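The name can be confusing because the "negated" form ends up subtracting the minuend from the product. Here is a small sketch that reproduces the value above in plain C#. FmsNegatedScalarSketch and Reference are hypothetical names for this illustration; the helper assumes the semantics described above and uses Math.FusedMultiplyAdd so the result is rounded once.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class FmsNegatedScalarSketch
{
    // Hypothetical reference helper (not a .NET API):
    // result = left * right - minuend, rounded once.
    public static double Reference(double minuend, double left, double right)
    {
        return Math.FusedMultiplyAdd(left, right, -minuend);
    }

    public static void Main()
    {
        double minuend = 11.5, left = 11.5, right = 11.0;

        Console.WriteLine($"Reference = {Reference(minuend, left, right)}");

        // The intrinsic only runs on Arm64 hardware, so guard the call.
        if (AdvSimd.IsSupported)
        {
            Vector64<double> actual = AdvSimd.FusedMultiplySubtractNegatedScalar(
                Vector64.CreateScalar(minuend),
                Vector64.CreateScalar(left),
                Vector64.CreateScalar(right));
            Console.WriteLine($"Intrinsic = {actual.ToScalar()}");
        }
    }
}
```

For the sample inputs this is 11.5 * 11 - 11.5 = 126.5 - 11.5 = 115.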
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> FusedMultiplySubtractNegatedScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractNegatedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fnmsub d16, d1, d2, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
45. FusedMultiplySubtractScalar
Vector64<double> FusedMultiplySubtractScalar(Vector64<double> minuend, Vector64<double> left, Vector64<double> right)
This method multiplies the scalar values of the left and right vectors, negates the product, adds it to the scalar value of the minuend vector, and returns the result. In other words, result = minuend - left * right.
private Vector64<double> FusedMultiplySubtractScalarTest(Vector64<double> minuend, Vector64<double> left, Vector64<double> right)
{
return AdvSimd.FusedMultiplySubtractScalar(minuend, left, right);
}
// minuend = <11.5>
// left = <11.5>
// right = <11>
// Result = <-115>
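The sketch below reproduces the value above and also illustrates, under my own choice of inputs, why the operation is called "fused": the product is not rounded before the subtraction. FmsScalarSketch, Reference and Unfused are hypothetical names for this illustration. The second pair of inputs is picked so that left * right equals 1 - 2^-60, which rounds to 1.0 when the multiplication is rounded on its own.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class FmsScalarSketch
{
    // Hypothetical reference helper (not a .NET API):
    // result = minuend - left * right, rounded once.
    public static double Reference(double minuend, double left, double right)
    {
        return Math.FusedMultiplyAdd(-left, right, minuend);
    }

    // Separate multiply and subtract: the product is rounded before the subtraction.
    public static double Unfused(double minuend, double left, double right)
    {
        double product = left * right;
        return minuend - product;
    }

    public static void Main()
    {
        // Sample values from this section.
        Console.WriteLine($"Reference = {Reference(11.5, 11.5, 11.0)}");

        // Inputs chosen for illustration: (1 + 2^-30) * (1 - 2^-30) = 1 - 2^-60.
        double eps = 1.0 / (1 << 30);
        Console.WriteLine($"Fused   = {Reference(1.0, 1.0 + eps, 1.0 - eps)}");
        Console.WriteLine($"Unfused = {Unfused(1.0, 1.0 + eps, 1.0 - eps)}");

        // The intrinsic only runs on Arm64 hardware, so guard the call.
        if (AdvSimd.IsSupported)
        {
            Vector64<double> actual = AdvSimd.FusedMultiplySubtractScalar(
                Vector64.CreateScalar(11.5),
                Vector64.CreateScalar(11.5),
                Vector64.CreateScalar(11.0));
            Console.WriteLine($"Intrinsic = {actual.ToScalar()}");
        }
    }
}
```

For the sample inputs both approaches agree, since 11.5 - 11.5 * 11 = -115 is exact. For the second pair of inputs the fused form keeps the tiny 2^-60 difference, while the separate multiply and subtract loses it.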
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> FusedMultiplySubtractScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmsub d16, d1, d2, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8